Add replace command with Calcite#4248
Add replace command with Calcite#4248manasvinibs wants to merge 1 commit intoopensearch-project:mainfrom
Conversation
| @Override | ||
| public LogicalPlan visitReplace(Replace node, AnalysisContext context) { | ||
| throw new UnsupportedOperationException( | ||
| "Replace is supported only when " + CALCITE_ENGINE_ENABLED.getKeyValue() + "=true"); |
There was a problem hiding this comment.
There's a util for this nowadays:
| String patternStr = pattern.toString(); | ||
| if (patternStr.contains("+") | ||
| || patternStr.contains("-") | ||
| || patternStr.contains("*") | ||
| || patternStr.contains("/")) { | ||
| throw new IllegalArgumentException( | ||
| "Expression is not allowed in replace pattern. Only string literals are supported."); |
There was a problem hiding this comment.
A little confused by this logic, we don't allow strings that contain these characters? Or if we're trying to forbid any expressions at all, I'm not sure just the 4 arithmetic operators will be sufficient (e.g. if someone tries to boolean and or something).
If we don't want an expression, the grammar should specify a stringLiteral or something along those lines, instead of expression.
There was a problem hiding this comment.
+1 for @Swiddis 's question, and I also left a comment in this validation method in general.
There was a problem hiding this comment.
Thanks for pointing this! Yes the idea was to avoid any non-string literals for pattern text. I have updated the grammar token itself to string literal to handle this use case.
RyanL1997
left a comment
There was a problem hiding this comment.
Hi @manasvinibs , thanks for taking this on. And I just left some comments.
| } | ||
|
|
||
| private void validate() { | ||
| if (pattern == null) { |
There was a problem hiding this comment.
The validation logic for the pattern expression hardcodes checks for mathematical operators (+, -, *, /) on L44-47. This is may miss other invalid expressions in my opinion. Maybe consider validating that pattern is a string literal by checking its type (e.g., Literal with DataType.STRING)?
Something like this:
if (!(pattern instanceof Literal && ((Literal) pattern).getType() == DataType.STRING)) {
throw new IllegalArgumentException("Replace pattern must be a string literal.");
}There was a problem hiding this comment.
Good call! Updated in the next revision.
| String patternStr = pattern.toString(); | ||
| if (patternStr.contains("+") | ||
| || patternStr.contains("-") | ||
| || patternStr.contains("*") | ||
| || patternStr.contains("/")) { | ||
| throw new IllegalArgumentException( | ||
| "Expression is not allowed in replace pattern. Only string literals are supported."); |
There was a problem hiding this comment.
+1 for @Swiddis 's question, and I also left a comment in this validation method in general.
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Outdated
Show resolved
Hide resolved
docs/user/ppl/cmd/replace.rst
Outdated
|
|
||
| Description | ||
| ============ | ||
| | Using ``replace`` command to replace text in one or more fields in the search result. The replaced text appears in new fields with prefix *new_*. |
There was a problem hiding this comment.
Is it our intended behavior to introduce new attribute?
There was a problem hiding this comment.
Currently with this implementation I'm reusing existing eval replace function which also exposes the replaced values into new fields. I'm mimicking the same behavior with direct command. But, if we think instead of adding new field, we should replace on the original one I'm open to explore that option. Let me know your thoughts.
docs/user/ppl/cmd/replace.rst
Outdated
| ============ | ||
| replace '<pattern>' WITH '<replacement>' IN <field-name>[, <field-name>]... | ||
|
|
||
| * pattern: mandatory. The text pattern you want to replace. |
There was a problem hiding this comment.
Let's clarify that we currently support only plain text pattern. (pattern could indicate wildcard/regex/etc.)
docs/user/ppl/cmd/replace.rst
Outdated
| fetched rows / total rows = 1/1 | ||
| +-------------------------------+-------------------------------+ | ||
| | message | new_message | | ||
| |------------------------+--------------------------------------| |
There was a problem hiding this comment.
nit: I see broken table format.
| newFieldNames.add(fieldName); | ||
| } | ||
|
|
||
| // Then add new fields with replaced content using new_ prefix |
There was a problem hiding this comment.
What happens if new_xxx already exist?
There was a problem hiding this comment.
I tested this behavior and I see system is automatically handling the field name collision by appending a number ("new_country0") and resolving field name conflicts gracefully. I have documented this behavior.
ef65df6 to
2d17ce7
Compare
2d17ce7 to
be806e8
Compare
a20ff96 to
966b939
Compare
| context.relBuilder.call( | ||
| SqlStdOperatorTable.REPLACE, fieldRef, patternNode, replacementNode); | ||
| projectList.add(replaceCall); | ||
| newFieldNames.add(NEW_FIELD_PREFIX + fieldName); |
There was a problem hiding this comment.
Looks it does not check existing field and add suffix number as written in the doc.
There was a problem hiding this comment.
Yes every replace command adds a new_ column and do not conflict with existing column names. Let me know if you think we should change the behavior.
There was a problem hiding this comment.
I was wondering the following doc description is right since this logic doesn't check existence of the field. (Is it something automatically done?)
* If a field with *new_* prefix already exists (e.g., 'new_country'), a number will be appended to create a unique field name (e.g., 'new_country0')
ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
Outdated
Show resolved
Hide resolved
ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java
Outdated
Show resolved
Hide resolved
ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
Show resolved
Hide resolved
ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Manasvini B S <manasvis@amazon.com>
966b939 to
3621bc9
Compare
|
This PR is stalled because it has been open for 2 weeks with no activity. |
Description
Implement the replace command in PPL to replace text patterns in specified fields. This PR includes the grammar implementation and basic replacement functionality. This implementation reuses the existing replace functionality. Missing/new features for regex support is added in a separate PR.
Syntax
<source> | replace '<pattern>' WITH '<replacement>' IN field_listpattern: The text pattern to search for (case-sensitive)
replacement: The text to replace matches with
field_list: Comma-separated list of fields to perform replacement in
Replace creates new fields with prefix 'new_' containing the replaced text, while preserving original fields.
Semantics
Expected Behavior:
Implementation Approach:
Example Queries
Output Schema
For each field specified in IN clause, a new field is created:
Example:
Input fields: message="error occurred", level="error"
After replace 'error' WITH 'ERROR' IN message, level:
Related Issues
#3975
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.